Generalized Centroid Estimators in Bioinformatics
Abstract
In many estimation problems in bioinformatics, an accuracy measure for the target problem is given in advance, and it is important to design estimators suited to that measure. However, there is often a discrepancy between the estimator employed and the accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit commonly used accuracy measures (e.g., sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only provides a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics.
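As a rough illustration of the MEA idea on a binary space, the sketch below uses one well-known member of this family, usually called the γ-centroid estimator. It assumes the gain is γ × (true positives) + (true negatives) and that predictions are unconstrained binary vectors; under those assumptions the expected gain is maximized by predicting 1 wherever the marginal probability exceeds 1/(γ+1). The toy distribution `toy` and all function names are hypothetical and chosen only for this example.

```python
from itertools import product

def marginals(dist):
    """Marginal probability that each coordinate equals 1."""
    n = len(next(iter(dist)))
    return [sum(p for theta, p in dist.items() if theta[i] == 1)
            for i in range(n)]

def gain(theta, y, gamma):
    """Assumed gain: gamma * (true positives) + (true negatives)."""
    tp = sum(1 for t, v in zip(theta, y) if t == 1 and v == 1)
    tn = sum(1 for t, v in zip(theta, y) if t == 0 and v == 0)
    return gamma * tp + tn

def expected_gain(dist, y, gamma):
    """Expectation of the gain of prediction y under the distribution."""
    return sum(p * gain(theta, y, gamma) for theta, p in dist.items())

def threshold_estimator(dist, gamma):
    """Predict 1 where the marginal exceeds 1 / (gamma + 1)."""
    cutoff = 1.0 / (gamma + 1.0)
    return tuple(1 if p > cutoff else 0 for p in marginals(dist))

def brute_force_best_gain(dist, gamma):
    """Largest expected gain over all binary predictions (small n only)."""
    n = len(next(iter(dist)))
    return max(expected_gain(dist, y, gamma)
               for y in product([0, 1], repeat=n))

# Hypothetical toy distribution over binary vectors of length 3.
toy = {(1, 1, 0): 0.5, (1, 0, 0): 0.2, (0, 0, 1): 0.2, (0, 1, 1): 0.1}

for g in (0.25, 1.0, 4.0):
    print(g, threshold_estimator(toy, g))
```

A larger γ rewards true positives more heavily, so the cutoff drops and more coordinates are predicted as 1. The brute-force helper is only there to check the thresholding rule on tiny examples; it enumerates all 2^n predictions and does not scale.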
Related articles
Prediction of RNA secondary structure using generalized centroid estimators
MOTIVATION: Recent studies have shown that methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage in prediction accuracy over the conventionally used minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximize...
On the Maximum Likelihood Estimators for some Generalized Pareto-like Frequency Distribution
In this paper we consider a four-parameter, so-called Generalized Pareto-like Frequency Distribution, which has been constructed using a stochastic birth-death process in order to model phenomena arising in bioinformatics (Astola and Danielian, 2007). As examples, two "real data" sets on the number of proteins and the number of residues are given for analyzing this distribution. The co...
Improved Measurements of RNA Structure Conservation with Generalized Centroid Estimators
Identification of non-protein-coding RNAs (ncRNAs) in genomes is a crucial task for not only molecular cell biology but also bioinformatics. Secondary structures of ncRNAs are employed as a key feature of ncRNA analysis since biological functions of ncRNAs are deeply related to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as the mos...
Admissible Estimators of ?r in the Gamma Distribution with Truncated Parameter Space
In this paper, we consider admissible estimation of the parameter ?r in the gamma distribution with a truncated parameter space under the entropy loss function. We obtain the classes of admissible estimators. The result can be applied to the estimation of parameters in the normal, lognormal, Pareto, generalized gamma, generalized Laplace and other distributions.
Generalized Family of Estimators for Imputing Scrambled Responses
When there is a high correlation between the study and the auxiliary variables, the rank of the auxiliary variable also correlates with the study variable. Then, the use of the rank as an additional auxiliary variable may be helpful to increase the efficiency of the estimator of the mean or total of the population. In the present study, we propose two generalized familie...